How Much Domain Data Should Be in Provenance Databases?

Authors

  • Daniel de Oliveira
  • Vítor Silva Sousa
  • Marta Mattoso
Abstract

Provenance databases are an important asset in the analysis of large-scale scientific data. The data derivation path makes it possible to identify parameters, files and domain data values of interest. In scientific workflows, provenance data is captured automatically by workflow systems. However, the power of provenance data analyses depends on how expressively domain-specific data is represented along the provenance traces. While much has been done through the W3C PROV initiative and its PROV-DM to represent generic provenance data, representing domain-specific data in provenance traces has received little attention, even though it accounts for a large number of provenance analytical queries. Such queries are based on selections over data values from the input/output artifacts of workflow activities. There are several problems in modeling and capturing values from domain-specific attributes: some relate to managing provenance granularity, others to addressing data values hidden inside files and to representing the semantics of domain data. In this work, we discuss these open issues and propose some alternatives for the capture, representation, storage and querying of domain-specific provenance data. Addressing these issues may be decisive in using provenance to drive scientific data analyses at large scale.


Similar resources

Improv: Flexible Data Provenance for Relational Databases

Curated databases, which consist of data extracted from original sources, printed articles, and other databases, are a valuable source of data for scientists. However, as curated databases aggregate information from multiple sources, the origin of the data elements can be lost. Because of this, curated databases often provide support for data annotations, which are pieces of extra information a...


Provenance and Probabilities in Relational Databases: From Theory to Practice

We review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation...


Propagation and Provenance of Probabilistic and Interval Uncertainty in Cyberinfrastructure-Related Data Processing and Data Fusion

In the past, communications were much slower than computations. As a result, researchers and practitioners collected different data into huge databases located at a single location such as NASA and US Geological Survey. At present, communications are so much faster that it is possible to keep different databases at different locations, and automatically select, transform, and collect relevant d...


Provenance Traces

Provenance is information about the origin, derivation, ownership, or history of an object. It has recently been studied extensively in scientific databases and other settings due to its importance in helping scientists judge data validity, quality and integrity. However, most models of provenance have been stated as ad hoc definitions motivated by informal concepts such as “comes from”, “influ...


Deciding How to Store Provenance

Provenance of a file is metadata pertaining to the history of the file. Provenance, unlike normal metadata stored in file systems, is retrieved primarily by running queries. This implies that provenance has to be indexed and should have a query interface. We believe that databases are the most appropriate place to store provenance as they provide both indexing and query capabilities. The goal o...




Publication date: 2015